The growing interest in comparing protein internal dynamics owes much to the realization that
protein function can be accompanied or assisted by structural fluctuations and conformational
changes. Analogously to the case of functional structural elements, those aspects of protein
flexibility and dynamics that are functionally oriented should be subject to evolutionary
conservation. Accordingly, dynamics-based protein comparisons or alignments could be used to
detect protein relationships that are more elusive to sequence and structural alignments. Here
we provide an account of the progress that has been made in recent years towards developing and
applying general methods for comparing proteins in terms of their internal dynamics and
advancing the understanding of the structure-function relationship.

I. INTRODUCTION

Over the past decades enormous efforts have been made to clarify the
sequence → structure → function relationships for proteins and enzymes. In particular
the sequence → structure connection has been extensively probed by dissecting the detailed
physico-chemical mechanisms that assist and guide the folding process of
several proteins [2£, .42] [53]. The more general aspects of this relationship are, however,
better captured by analysing the degenerate mapping between the ensembles
of naturally-occurring protein sequences and their corresponding
folds HSMHl 111 EZl [Ml |S3 [Ml-. For instance,
the current ~85,000 entries can be clustered in about
20,000 non-redundant sequence sets but cover only 1,500
distinct structural folds [112, 122].

The introduction of general quantitative schemes for comparing, or aligning, protein
sequences and protein structures has played a crucial role in framing the observed
many-to-one sequence-structure relationship in the context of molecular evolution [117, 139].
In particular, by following the impact that evolutionary sequence divergence has on native
structural changes [28], it has been possible to identify general properties of peptide
chains, amino acid hydrogen-bonding patterns, thermodynamic stability etc. that govern the
sequence-structure relationship by constraining the repertoire of viable structural changes
that are evolutionarily accessible [27] [34] [MllMl [TfKll [147] [156] [170] [176].

As a result, remote evolutionary relationships are more
confidently obtained from structure-based comparative
methods than from sequence-based ones.

Besides the above general constraints, additional and stronger ones are imposed by
functional requirements. In fact, it has long been known that enzymes that have
evolutionarily diverged and that catalyze different reactions tend to conserve very
precisely functional structural elements and the location of the active site, where
different amino acids can be recruited for different functions ^ITJI
|M1[IIS1[I21[IM1-. More recently it has also emerged that
specific features of protein internal dynamics that impact biological activity and
functionality can also be subject to evolutionary conservation [HI [57] [133] [151] [T5^.

By analogy with the sequence-structure case, one 
may therefore envisage that quantitative methods apt 
for comparing function-oriented properties in different 
proteins could advance the capability of detecting pro- 
tein evolutionary relationships that may be elusive to 
sequence- or structure-based investigations. 

Here we shall review recent studies which focused on the comparison of protein internal
dynamics, which is arguably one of the many aspects that often, though not always, assist
or influence protein function over a wide range of time scales [14, 37, 103]. For example,
concerted structural movements in enzymes, either "innate" or triggered by ligand binding,
have been argued to be important for enzymes to achieve a catalytically-competent state
and promote catalytic efficiency, as well as for allosteric signal propagation and
protein-protein interactions [2 [TI] [TH
[11[31[3I1'38, 52, 55, 59, 60, 71, 87, 101, 105, lOl [TUH
im Wm [T26. ,132. ,137. ,155> .160. .168. ,179. .181. ,18^ .

We shall accordingly report on the progress that has 
been made in recent years towards developing and ex- 
ploiting quantitative numerical strategies for comparing 
the internal dynamics of proteins and explore its connec- 
tion with structural and functional similarities. 

The material presented in the review is organised as follows. Because these approaches are
virtually all based on numerical characterizations of protein internal dynamics we shall
first provide a self-contained methodological summary of the theoretical/computational
techniques used to characterize and compare protein internal dynamics. Next we shall
overview the contexts where dynamics-based comparisons, with different resolution
and scope, have been applied. We shall further provide an in-depth discussion of a number
of selected instances where dynamics-based similarities have been detected within
structurally-heterogeneous members of specific protein families, and even across protein
families.



II. COMPARING PROTEIN INTERNAL 
DYNAMICS: METHODOLOGICAL ASPECTS 

In this section we provide a self-contained overview of the quantitative numerical
approaches employed to characterize and compare the internal dynamics of proteins.
In particular, we first review the essential dynamics analysis techniques which are
commonly applied to atomistic molecular dynamics simulations or phenomenological
coarse-grained models (elastic networks) to single out the collective degrees of freedom
that best account for a protein's internal motion in thermal equilibrium. Next
we shall discuss how the essential dynamical spaces and other dynamics-related quantities
can be used for comparative purposes.



A. Protein internal dynamics: essential dynamics 
analysis of MD trajectories 

The wealth of information produced by extensive 
atomistic molecular dynamics (MD) simulations of glob- 
ular proteins is typically described and rationalised by 
identifying the few collective degrees of freedom that 
best capture the internal protein dynamics. Arguably, 
the most commonly used technique is represented by the 
principal component analysis [15] of amino acid pairwise 
displacements. 

This technique relies on the spectral decomposition of the matrix of pairwise correlations
of the displacements of amino acids, represented by their Cα atoms, from their
reference positions.

In the following we shall indicate with $\mathbf{r}_i(t)$ the three-dimensional position at
simulation time $t$ of the $i$th Cα atom and with
$\delta\mathbf{r}_i(t) = \mathbf{r}_i(t) - \langle \mathbf{r}_i \rangle$ the associated vector
displacement from the average reference position. A generic entry of the matrix of pairwise
displacement correlations, $C$, is accordingly defined as

$$C_{ij,\mu\nu} = \langle \delta r_{i,\mu}(t)\, \delta r_{j,\nu}(t) \rangle \tag{1}$$

where $\delta r_{i,\mu}(t)$ is the $\mu$th Cartesian component of the vector displacement of
the $i$th amino acid and $\langle\,\rangle$ denotes the average over simulation time. For
proteins consisting of $N$ amino acids, the symmetric covariance matrix $C$ has
linear size equal to $3N$.

It is important to notice that the matrix element of eq. (1) can be equivalently rewritten as:

$$C_{ij,\mu\nu} = \sum_{l=1}^{3N} \lambda_l\, v^l_{i,\mu}\, v^l_{j,\nu} \tag{2}$$

where $\lambda_1, \lambda_2, \ldots$ are the eigenvalues of $C$ ranked by decreasing magnitude
and $\mathbf{v}^1, \mathbf{v}^2, \ldots$ are the corresponding orthonormal eigenvectors.

Because the protein overall mean square fluctuation is given by

$$\sum_i \langle |\delta\mathbf{r}_i(t)|^2 \rangle = \sum_{i,\mu} C_{ii,\mu\mu} = \sum_l \lambda_l \tag{3}$$

one has that the top ranking eigenvectors of $C$ embody the independent degrees of freedom
that most contribute to the internal dynamics of the protein. Indeed, for most
globular proteins of 100-200 amino acids, the top 10 eigenvectors suffice to capture most
of the protein mean square fluctuation [48]. For this reason, considerations
are typically restricted to the linear space spanned by the top eigenvectors of $C$, which
is commonly termed the essential dynamical space [5].
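
As an illustration of the analysis described above, the following minimal Python sketch
builds the covariance matrix of eq. (1) from a set of already-superposed Cα coordinates and
extracts the top essential modes. It is not taken from the reviewed studies: the function
name `essential_modes`, the array layout and the synthetic trajectory are assumptions made
only for this example.

```python
import numpy as np

def essential_modes(traj, n_modes=10):
    """traj: array (n_frames, N, 3) of superposed C-alpha positions (assumed layout)."""
    n_frames, n_res, _ = traj.shape
    # Displacements from the time-averaged reference structure
    disp = (traj - traj.mean(axis=0)).reshape(n_frames, 3 * n_res)
    # 3N x 3N covariance matrix of eq. (1), computed as a time average
    cov = disp.T @ disp / n_frames
    # eigh returns eigenvalues in ascending order; re-rank by decreasing magnitude
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Fraction of the total mean square fluctuation (trace of C, eq. 3) in the top modes
    captured = evals[:n_modes].sum() / evals.sum()
    return evals[:n_modes], evecs[:, :n_modes], captured

# Synthetic toy trajectory: one collective motion plus small uncorrelated noise
rng = np.random.default_rng(0)
n_frames, n_res = 500, 30
ref = rng.normal(size=(n_res, 3))
direction = rng.normal(size=(n_res, 3))
amplitude = rng.normal(scale=2.0, size=(n_frames, 1, 1))
noise = rng.normal(scale=0.1, size=(n_frames, n_res, 3))
traj = ref + amplitude * direction + noise

evals, modes, captured = essential_modes(traj)
print(f"top-10 modes capture {captured:.1%} of the mean square fluctuation")
```

In this toy case almost all of the fluctuation is, by construction, carried by a single
collective mode; for an actual MD trajectory the frames would first have to be optimally
superposed, as discussed later in this section.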

The structural deformations entailed by the essential eigenvectors, or essential modes, are
typically found to embody concerted, collective displacements of protein
subportions consisting of several amino acids [35, 162].
As a matter of fact, the large-scale collective conformational changes that many proteins
and enzymes need to sustain in order to carry out their biological functionality have been
shown to lie in the essential dynamical
space [3, 33, 40, 104. fmi [Till [150] [155] [T5T].

These observations provide an a posteriori justification for considering the essential
dynamical spaces as providing key insight into functionally-oriented aspects of
proteins.

We conclude by noting that one relevant technical point of the essential dynamics analysis
regards the definition of the reference amino acid positions from which
the instantaneous displacements $\delta\mathbf{r}$ are calculated. For proteins that have an
overall rigid-like character, these positions can be obtained by averaging the conformers
sampled by the MD simulation after optimally superposing them. The structural superposition
is necessary to remove the overall rotations and translations of the
molecules. It is important to stress that this step is not trivially accomplished when
proteins have an appreciable internal flexibility (e.g. due to the presence of
mobile subdomains) [184]. In this case, to avoid artefactual results, it is crucial to
identify the correct frame of reference for describing and computing the internal
structural fluctuations of the protein, see e.g. the discussion
of refs. [60, 61] and related supporting material.

However, it must be noted that the relative displacements of domains in multidomain
proteins can be so large that protein movements cannot be reliably described
by a linear superposition of a limited number of essential modes, even if obtained with
the above-mentioned procedure. A prototypic example is offered
by the relative rotation of protein domains by a finite angle. In this case the directions
of instantaneous rotations of the two extreme positions can project very
poorly on the difference vector of the latter (see Fig. 3 in ref. [151]). In such cases
the salient degrees of freedom of protein internal dynamics can be identified
by decomposing the protein of interest into quasi-rigid
domains [2, 16, 54, 56, 66, 7^] [131] [175] and next considering
their relative roto-translations [108], see also section



B. Essential dynamical spaces from elastic network 
models 

The collective character of the top eigenvectors of the covariance matrix $C$ obtained from
atomistic MD simulations suggests that the essential dynamical spaces could
be reliably identified by coarse-grained protein models.

This observation, which was stimulated by the seminal work of M. Tirion [162], has in fact
led to the introduction of the well-known elastic network models which, despite adopting a
simplified description of a protein's structure and its native amino acid interactions,
can reliably identify the essential dynamical spaces of globular proteins with a negligible
computational expenditure 13 H [Ml |S3 |M1 IMl [113].

In these approaches, each amino acid is described by one or few centroids (e.g. the Cα atom
for the main chain [7] and an additional centroid for the side chain [101]);
the model potential energy is constructed by introducing quadratic penalties for the
deviations from the native values of the distance of all pairs of centroids that are
in contact in the native state. Accordingly, for a protein consisting of $N$ amino acids,
the resulting potential energy has the form:

$$U = \frac{1}{2} \sum_{ij,\mu\nu} \delta r_{i,\mu}\, M_{ij,\mu\nu}\, \delta r_{j,\nu} \tag{4}$$

where $M$ is a symmetric matrix of linear size $3N$. In the following we shall indicate with
$\tau_1, \tau_2, \ldots, \tau_{3N}$ the eigenvalues of $M$ ranked for increasing magnitude, and
with $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^{3N}$ the associated orthonormal eigenvectors.
The eigenvalues $\{\tau_l\}$ are all positive except for the six
attributed to the global rotations and translations of the molecule. It is evident that
eq. (4) bears strong analogies with the normal mode analysis of proteins [5^] [162].

Because of the quadratic character of the model potential energy of eq. (4), canonical
equilibrium properties of the elastic network can be calculated exactly. In particular,
a generic entry of the model covariance matrix $C$ is given by

$$C_{ij,\mu\nu} = k_B T\, \tilde{M}^{-1}_{ij,\mu\nu} \tag{5}$$

where $k_B T$ is the thermal energy at the temperature of interest, $T$, and the tilde
superscript denotes the pseudoinversion operation, i.e. the removal of the zero-eigenvalue
space prior to the inversion of $M$. Equivalently, $C$ can be written as

$$C_{ij,\mu\nu} = k_B T\, {\sum_l}^{\prime}\, \frac{1}{\tau_l}\, w^l_{i,\mu}\, w^l_{j,\nu} \tag{6}$$

where the prime indicates the omission of the eigenspaces associated to the zero
eigenvalues. The above expression clarifies that the degrees of freedom that most account
for the proteins' fluctuations in thermal equilibrium correspond to the modes of protein
deformation associated to the smallest eigenvalues, i.e. those that cost least
energy to excite.
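
The construction of eqs. (4)-(6) can be sketched numerically as follows. The example
assumes the simplest possible network, with one bead per amino acid, a uniform spring
constant and a sharp distance cutoff; these choices, the parameter values and the function
names are illustrative assumptions and do not correspond to the specific two-centroid
model of ref. [101].

```python
import numpy as np

def anm_interaction_matrix(coords, cutoff=10.0, k_spring=1.0):
    """3N x 3N matrix M of eq. (4) for a one-bead-per-residue network (assumed model)."""
    n = len(coords)
    M = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # pair not in contact in the reference structure
            # Second derivative of a harmonic penalty on the i-j distance
            block = -k_spring * np.outer(d, d) / dist2
            M[3*i:3*i+3, 3*j:3*j+3] = block
            M[3*j:3*j+3, 3*i:3*i+3] = block
            M[3*i:3*i+3, 3*i:3*i+3] -= block
            M[3*j:3*j+3, 3*j:3*j+3] -= block
    return M

def enm_covariance(M, kT=1.0, n_zero=6):
    """Covariance of eq. (6): pseudoinverse of M omitting the n_zero zero modes."""
    tau, w = np.linalg.eigh(M)           # eigenvalues in ascending order
    tau, w = tau[n_zero:], w[:, n_zero:]  # drop global rotations and translations
    return kT * (w / tau) @ w.T

# Toy compact "structure": 20 random points, densely connected with this cutoff
rng = np.random.default_rng(1)
coords = rng.normal(scale=3.0, size=(20, 3))
M = anm_interaction_matrix(coords, cutoff=12.0)
C = enm_covariance(M)
msf = C.diagonal().reshape(-1, 3).sum(axis=1)  # per-residue mean square fluctuation
```

The lowest-eigenvalue modes of `M` dominate `C`, which is the numerical counterpart of the
statement that the softest modes account for most of the equilibrium fluctuations.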

If the protein dynamics were described by an overdamped Langevin scheme, these low-energy
modes would also be those having the slowest relaxation time. Although the harmonic
character of the near-native free energy well and the white noise Langevin description
apply only to a limited extent to proteins pi El Ell IMl [103] [127],
the observation is qualitatively consistent with the fact that collective low-energy modes
in proteins occur over long time scales (and hence are occasionally referred to
as "low-frequency" modes). These observations motivate the practice, adopted in this review
too, of regarding the principal components of equilibrium structural fluctuations as
embodying the salient internal dynamical properties.

We conclude by mentioning that in recent years alternative formulations of elastic network
models have been proposed, including versions based on the matching of observables obtained
from atomistic MD simulations [115] and on the use of internal coordinates, which are
commonly used in normal mode analysis too [53, 89, 90, 92].



C. Anharmonicity of the protein free energy
landscape

The viability of elastic network models to capture the salient traits of protein
conformational fluctuations is justified a posteriori by the good accord between the
essential covariance matrices of elastic network models and of extensive atomistic MD
simulations. For example, ref. [101] compared the covariance matrices
of HIV-1 protease with a bound ligand obtained from a 14-ns MD simulation with an atomistic
force-field and explicit solvent and from the beta-Gaussian elastic network
model, which employs two centroids per amino acid (for main- and side-chain, respectively).
The linear correlation coefficient of the ~20,000 corresponding distinct
entries of the two matrices was significant (equal to 0.8), as was the consistency of the
two sets of essential dynamical spaces. A more recent example of the good accord of
protein structural fluctuations computed with elastic network models and MD atomistic
simulations is provided by the work of Romo and Grossfield on GPCR membrane
proteins [142]. This study showed that a suitably-parametrized model can match the essential
dynamical spaces and their relative weight observed in microsecond-long
simulations [142].

This agreement is noteworthy in consideration of the highly complex free energy landscape
that folded proteins can explore in thermal equilibrium. In fact,
this landscape presents several tiers of local minima [45, 46, 171], with low barriers
(compared to the thermal energy $k_B T$) separating conformational states with
local structural differences, such as the rotameric state of a sidechain, while large ones
separate conformational ensembles with major subdomain rearrangements, such
as the open and closed conformations of certain enzymes.
In turn, the hierarchical organization of these minima is reflected in a broad range of
time-scales, from the ps to the ms and beyond, over which the mentioned structural changes
can occur, as observed in NMR and single-molecule
experiments [H [m EOl [Ml [175].

From these general considerations and from the detailed analysis of the protein
conformational substates visited over MD trajectories of hundreds of ns jlSFl [138],
it emerges that the harmonic approximation on which elastic network models rely may be a
highly simplified parametrization of even the near-native free energy landscape.

While this limitation, which may be more or less severe depending on the molecule rigidity,
must clearly be borne in mind, it should be noted that the free-energy
landscape of a few proteins has been shown to be endowed with particular properties that
make the harmonic, or quasi-harmonic [57, 65, 73, 85], free-energy approximation
still informative even when dealing with major and slow conformational changes.
Specifically, computational studies of lysozyme T5], protein G [127] and adenylate
kinase [128] have clarified that the principal directions of the free energy minima
associated to the substates populated by each of these proteins are very consistent
with each other and also very similar to the difference vectors connecting the substates
themselves. This indicates that, despite their structural differences, different
substates of the same protein tend to have very similar modes of conformational
fluctuations and that the latter, in turn, predispose the observed conformational changes
between substates. Indeed, by analyzing and comparing the covariance matrices of longer and
longer MD trajectories of protein G [127], it was seen that while the
trace of the matrix tended to increase (due to the breadth of visited conformational
space), the consistency of the essential spaces remained highly significant. Analogous
conclusions were drawn more recently by Liu et al., who compared the consistency of
essential dynamical spaces of cyanovirin-N obtained from atomistic simulations of
varying duration [86].

From these results it emerges that the essential dynamical spaces calculated from a
relatively short MD simulation or from an elastic network model would still
bear information on the conformational fluctuations sustained by the proteins over
time-scales where the harmonic approximation is invalid. The fact that these
considerations might hold more in general and not only for the proteins investigated in
refs. [7Sl |HS1 [127] [128] is reinforced by the fact that the difference vector bridging
pairs of different protein conformers (such as open and closed forms of several enzymes)
has been shown to overlap significantly with the essential dynamical
spaces calculated from elastic network models for either
conformer O ESI [Ml [Ml HM].
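
One common way to quantify the overlap mentioned above is the norm of the projection of
the normalised difference vector onto the essential space. The short sketch below
illustrates this under the assumption that the two conformers have already been optimally
superposed and that the modes are stored as orthonormal columns; the function name
`subspace_overlap` and the random test data are hypothetical.

```python
import numpy as np

def subspace_overlap(diff, modes):
    """Norm of the projection of the normalised vector `diff` onto span(modes).

    diff  : difference vector between two superposed conformers (any shape, 3N entries)
    modes : (3N, k) array whose columns are orthonormal essential modes
    Returns a value between 0 (orthogonal) and 1 (fully contained in the space).
    """
    d = np.ravel(diff) / np.linalg.norm(diff)
    return np.sqrt(np.sum((modes.T @ d) ** 2))

# Toy check with a random 5-dimensional "essential space" of a 10-residue chain
rng = np.random.default_rng(2)
modes, _ = np.linalg.qr(rng.normal(size=(30, 5)))
diff_in = modes @ rng.normal(size=5)            # lies entirely in the space
x = rng.normal(size=30)
diff_out = x - modes @ (modes.T @ x)            # orthogonal to the space
print(subspace_overlap(diff_in, modes), subspace_overlap(diff_out, modes))
```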



D. Essential dynamical spaces of protein 
sub-portions 

For the purpose of comparing the essential dynamical 
spaces of proteins with different length and/or architec- 
ture it is necessary to identify the essential dynamical 
spaces of specific protein subparts. 

This is straightforward to do in the context of atomistic molecular dynamics simulations.
In fact, one simply needs to restrict considerations to the amino acids of interest when
calculating the average reference structure and the covariance matrix. The top
eigenvectors of this "reduced" covariance matrix (whose entries are clearly
not equal to the corresponding ones in the matrix computed for the full protein)
accordingly provide the generalised degrees of freedom that best capture the internal
motion of the amino acids of interest.

A different approach is however needed for elastic network models. In this case, the
reduced covariance matrix of the amino acids of interest must be obtained by
the thermodynamic integration of the degrees of freedom of the remaining amino acids. For
completeness of notation we assume that the $N$ protein amino acids have
been grouped in two sets, $a$ and $b$. Set $a$ gathers all the $n$ amino acids of interest.
The interaction matrix $M$, after the row/column reordering following the amino acid
groupings, can be partitioned in blocks as follows:

$$M = \begin{pmatrix} M_a & V \\ V^T & M_b \end{pmatrix} \tag{7}$$

where the submatrices $M_a$ and $M_b$ capture the elastic network interactions involving
pairs of amino acids in set $a$ and $b$, respectively, matrix $V$ contains the elastic
network couplings of amino acids in the two sets and $T$ denotes the transpose. Matrices
$M_a$ and $M_b$ are square and symmetric (of linear size $3n$ and $3(N-n)$, respectively)
while matrix $V$ is, in general, rectangular.

Because of the quadratic character of the energy function $U$ it is possible to calculate
exactly the reduced matrix of effective interactions for amino acids in set $a$, which
is equal to:

$$M_a^{\mathrm{eff}} = M_a - V\, M_b^{-1}\, V^T \tag{8}$$

and finally, the covariance matrix of set $a$ is obtained by taking the pseudoinverse
of $M_a^{\mathrm{eff}}$ [65, 104]. It is important to point out that the second term in the
right-hand-side of the above equation allows for taking into account the influence of the
remaining amino acids on those of interest. This term is also crucial to ensure that
the dynamics of amino acids in set $a$ is described in the proper reference system where
the roto-translations of set $a$ alone (and not the whole protein) are extracted.
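
The block elimination of eq. (8) can be sketched in a few lines. The example below is only
a numerical check of the underlying identity on a positive-definite toy matrix, for which
plain inverses suffice; for an actual elastic network the pseudoinverse of the reduced
matrix would be taken, as described above. The function name `effective_matrix` and the
random matrix are assumptions of this sketch.

```python
import numpy as np

def effective_matrix(M, idx_a):
    """Reduced matrix of eq. (8) for the residues listed in idx_a (3 dofs per residue)."""
    n_res = M.shape[0] // 3
    in_a = np.zeros(n_res, dtype=bool)
    in_a[idx_a] = True
    dof_a = np.repeat(in_a, 3)                # Cartesian degrees of freedom of set a
    M_a = M[np.ix_(dof_a, dof_a)]
    M_b = M[np.ix_(~dof_a, ~dof_a)]
    V = M[np.ix_(dof_a, ~dof_a)]
    # Solve M_b X = V^T instead of forming the inverse of M_b explicitly
    return M_a - V @ np.linalg.solve(M_b, V.T)

# Toy symmetric positive-definite matrix standing in for M (6 "residues")
rng = np.random.default_rng(3)
A = rng.normal(size=(18, 18))
M_toy = A @ A.T + np.eye(18)
idx_a = [0, 2, 5]

M_eff = effective_matrix(M_toy, idx_a)
dofs = np.repeat(np.isin(np.arange(6), idx_a), 3)
cov_direct = np.linalg.inv(M_toy)[np.ix_(dofs, dofs)]   # a-block of the full covariance
cov_reduced = np.linalg.inv(M_eff)                      # covariance from the reduced matrix
```

The agreement of `cov_direct` and `cov_reduced` shows that integrating out set $b$ leaves
the fluctuations of set $a$ unchanged, which is the property exploited in the text.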

We conclude by mentioning that, in the same spirit of eq. (8), one can obtain effective
interaction (and covariance) matrices for few generalised degrees of freedom that depend
linearly on amino acid Cartesian coordinates. One such example is offered by the study of
ref. [17], where the structural fluctuations of a large set
of EF-hand proteins were studied in terms of the relative motion of the axes of their
four helices.

A further relevant avenue where the degrees-of-freedom integration can be profitably
applied is represented by proteins embedded in a constraining matrix. A notable
instance is represented by membrane proteins, whose conformational plasticity can have
important functional implications [39, 145]. For such proteins, Romo and
Grossfield [142] have recently shown that eq. (8) can be generalised and used to define
effective inter-amino acid interactions which take into account the influence of the
embedding bilayer.



E. Measures of similarities of two sets of essential
dynamical spaces

The information about protein internal dynamics that can be gleaned by applying the
methods described in the previous sections can be used in quantitative approaches
for the dynamics-based comparison, or alignment, of proteins.

We start by discussing the case where the two proteins of interest, A and B, are so
similar that sequence or structural alignments suffice to establish extensive one-to-one
correspondences between all of their amino acids or a subset of them.

The consistency of the dynamics of the two sets of amino acids marked for alignment can be
assessed by the standard root mean square inner product (RMSIP)
of their essential dynamical spaces. Customarily, the comparison is restricted to the top
10 essential modes, which are usually sufficient to cover most of the global
mean square fluctuation of a protein observed in MD
simulations [15]. Accordingly, the RMSIP is defined as:

$$\mathrm{RMSIP} = \sqrt{\frac{1}{10} \sum_{l,m=1}^{10} \left| \mathbf{v}^l \cdot \mathbf{w}^m \right|^2} \tag{9}$$

$$\phantom{\mathrm{RMSIP}} = \sqrt{\frac{1}{10} \sum_{l,m=1}^{10} \Big| \sum_{i=1}^{n} \sum_{\mu} v^l_{i,\mu}\, w^m_{i,\mu} \Big|^2} \tag{10}$$

where $\mathbf{v}^l$ and $\mathbf{w}^l$ denote the $l$th essential mode of the marked amino
acids in protein A and B, respectively, and we have further assumed that matching amino
acids carry the same index, $i = 1 \ldots n$, in the two proteins. Because
of the orthonormality of each of the two basis sets {v}'s and {w}'s, the RMSIP takes on
values in the 0-1 range.

The RMSIP measure was introduced for the purpose of assessing the convergence of an MD
simulation by comparing the essential dynamical spaces of e.g. the first and
second half of the trajectory [4]. Although a simple quantitative criterion for its
statistical significance is lacking, it is generally held that RMSIP values larger than 0.7
imply meaningful dynamical similarities [62]. For completeness we mention that other
measures of dynamical similarity and MD simulation convergence are available,
see e.g. refs. [151 ITfl [TMl [Ti5l [Hi] .

We finally point out that, for the purpose of profiling the contribution of individual
amino acids to the overall mean square inner product one can consider the quantity,
which is invariant for changes of the basis of the essential dynamical spaces [18]:

$$Q_i = \frac{1}{10} \sum_{l,m=1}^{10} \left| \mathbf{v}^l_i \cdot \mathbf{w}^m_i \right|^2 \tag{11}$$

where $i$ is the index of the amino acid of interest and $\mathbf{v}^l_i$, $\mathbf{w}^m_i$
are the three-dimensional components of the modes on that amino acid, or its
square root $q_i = \sqrt{Q_i}$.



F. Best-matching essential dynamical spaces.

The RMSIP of eqn. (10) measures the overall consistency of the essential dynamical spaces
and therefore is invariant upon change of the basis vectors for the two
linear spaces, {v}'s and {w}'s.

This property can be exploited to replace the {v}'s and {w}'s with two new sets of
orthonormal vectors $\tilde{\mathbf{v}}^1, \tilde{\mathbf{v}}^2, \ldots, \tilde{\mathbf{v}}^{10}$
and $\tilde{\mathbf{w}}^1, \tilde{\mathbf{w}}^2, \ldots, \tilde{\mathbf{w}}^{10}$ which are ranked
for decreasing mutual consistency (magnitude of the scalar product) [128].

To do so, one constructs the 10x10 asymmetric matrix $D$ whose entries are
$D_{ij} = \mathbf{w}^i \cdot \mathbf{v}^j$. Next one solves the eigenvalue problems [128]:

$$D D^T\, \mathbf{a}^i = \mu_i\, \mathbf{a}^i \tag{12}$$

$$D^T D\, \mathbf{b}^i = \mu_i\, \mathbf{b}^i \tag{13}$$

Assuming that the eigenvalues have been ranked by decreasing order,
$\mu_1 > \mu_2 > \ldots > \mu_{10}$, one has that the new basis vectors are given by

$$\tilde{\mathbf{w}}^i = \sum_{j=1}^{10} a^i_j\, \mathbf{w}^j \tag{14}$$

$$\tilde{\mathbf{v}}^i = \sum_{j=1}^{10} b^i_j\, \mathbf{v}^j \tag{15}$$

The newly defined orthonormal bases, $\{\tilde{\mathbf{v}}\}$ and $\{\tilde{\mathbf{w}}\}$,
have the following remarkable properties:

• the $i$th vector in one set is orthogonal to all vectors in the other set with index
different from $i$, i.e. $\tilde{\mathbf{v}}^i \cdot \tilde{\mathbf{w}}^j = 0$ if $i \neq j$;

• the scalar products $\tilde{\mathbf{v}}^i \cdot \tilde{\mathbf{w}}^i$ have magnitude that
decreases with $i$;

therefore the new basis vectors are optimally ranked for decreasing mutual consistency and
are ideally suited to represent the most consistent (or inconsistent) subspaces
spanned by the {v}'s and {w}'s [128].

Once more we stress that, as the $\{\tilde{\mathbf{v}}\}$ and $\{\tilde{\mathbf{w}}\}$ provide
alternative bases for the same spaces spanned by the {v}'s and {w}'s, the RMSIP of
$\{\tilde{\mathbf{v}}\}$ and $\{\tilde{\mathbf{w}}\}$ is the same as for
the {v}'s and {w}'s.
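
The RMSIP and the re-ranking of the two bases for decreasing mutual consistency can be
illustrated with the following minimal sketch. It assumes that the essential modes of the
matching amino acids are stored as orthonormal columns of two arrays, and it obtains the
best-matching bases from a singular value decomposition of the matrix of scalar products,
which is numerically equivalent to solving the two symmetric eigenvalue problems for
$D D^T$ and $D^T D$. Function names and test data are hypothetical.

```python
import numpy as np

def rmsip(V, W):
    """Root mean square inner product of two sets of k orthonormal modes (columns)."""
    D = W.T @ V                               # D[i, j] = w^i . v^j
    return np.sqrt(np.sum(D ** 2) / V.shape[1])

def best_matching_bases(V, W):
    """Rotate both bases so that matching vectors are ranked by mutual consistency."""
    D = W.T @ V
    A, s, Bt = np.linalg.svd(D)               # D = A diag(s) B^T, s in decreasing order
    V_new = V @ Bt.T                          # new basis for the space spanned by V
    W_new = W @ A                             # new basis for the space spanned by W
    return V_new, W_new, s                    # s**2 are the eigenvalues of D D^T

# Toy example: two random 10-dimensional subspaces for 20 matching residues
rng = np.random.default_rng(4)
V, _ = np.linalg.qr(rng.normal(size=(60, 10)))
W, _ = np.linalg.qr(rng.normal(size=(60, 10)))

V_new, W_new, s = best_matching_bases(V, W)
print("RMSIP:", rmsip(V, W), "after re-ranking:", rmsip(V_new, W_new))
print("scalar products of matching vectors:", np.round(s, 3))
```

The two printed RMSIP values coincide, since only the bases and not the spaces have been
changed, while the scalar products of non-matching vectors in the new bases vanish.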



G. Beyond structural alignment: dynamics-based 
protein alignment 

1. Aligning proteins by matching their essential dynamical 
spaces 

The previous approach needs to be suitably generalised in contexts where one wishes to
detect dynamics-based correspondences in different proteins without relying on
their prior sequence or structure alignment.

A prototypical situation is illustrated in Fig. 1, where two cartoon structures with
different shape are sketched in panels a) and b). Despite the overall shape difference,
the structural deformation modes described by the arrows are well-consistent and can
provide the basis for aligning the two structures, see panel c).

As first noted by Zen et al. [177], the example in Fig. 1 clarifies that meaningful
dynamics-based alignments cannot be simply obtained by purely rewarding the similarity
of directionality and magnitude of the essential dynamical spaces of any two sets of amino
acids in the proteins of interest. In fact, the alignment shown in panel c) is
intuitively perceived as viable because the origins of the paired arrows, A-A' and B-B',
are nearby in space. If the origins had been arbitrarily dislocated in space, then
the paired arrows would not have implied any consistent structural modulations of the two
shapes (but motions of very large amplitude can significantly change the geometrical
relationships of dynamically-corresponding regions,
see section III I).

Prompted by the above considerations, Zen et al. [177] introduced and applied a
dynamics-based alignment scheme which simultaneously rewarded the consistency of
the essential dynamical spaces of matching amino acids as well as their spatial proximity.
Specifically, in this alignment technique the score to be maximised over the possible sets
of corresponding amino acid pairs was based on a distance-weighted generalization of the
root mean square inner product,

$$\sqrt{\frac{1}{10} \sum_{l,m=1}^{10} \Big| \sum_{i=1}^{n} f(d_i) \sum_{\mu} v^l_{i,\mu}\, w^m_{i,\mu} \Big|^2} \tag{17}$$

where $i = 1, \ldots, n$ runs over the $n$ aligned amino acids,
$d_i$ is the distance between the $i$th (matching) amino acids in proteins A and B after an
optimal superposition over the putative matching region, and
$f(d) = [1 - \tanh((d - d_c)/\Delta)]/2$ is a sigmoidal distance weighting factor
where $d_c = 4$ Å and $\Delta = 2$ Å.

Notice that, as for the RMSIP, the measure (17) is independent of the choice of the bases
spanning the linear space of the top 10 essential dynamical modes.
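
The sigmoidal weighting factor defined above, with $d_c = 4$ Å and $\Delta = 2$ Å, is easy
to evaluate numerically. The sketch below also shows one plausible way of inserting it in a
root mean square inner product, under the explicit assumption that $f(d_i)$ multiplies the
contribution of the $i$th pair of matching amino acids to each scalar product; this is an
illustrative reading and not necessarily the exact score used in ref. [177]. All function
names are hypothetical.

```python
import numpy as np

def distance_weight(d, d_c=4.0, delta=2.0):
    """Sigmoidal weight f(d) = [1 - tanh((d - d_c)/delta)]/2, distances in Angstrom."""
    return 0.5 * (1.0 - np.tanh((np.asarray(d, dtype=float) - d_c) / delta))

def weighted_rmsip(V, W, d):
    """Distance-weighted RMSIP (assumed form): V, W are (3n, k) mode arrays,
    d holds the n distances between matching residues after superposition."""
    k = V.shape[1]
    f = np.repeat(distance_weight(d), 3)      # same weight for the x, y, z components
    D = W.T @ (f[:, None] * V)                # weighted scalar products of all mode pairs
    return np.sqrt(np.sum(D ** 2) / k)

# Toy data: identical modes in the two "proteins", near or far matching residues
rng = np.random.default_rng(5)
V, _ = np.linalg.qr(rng.normal(size=(60, 10)))
W, _ = np.linalg.qr(rng.normal(size=(60, 10)))
R, _ = np.linalg.qr(rng.normal(size=(10, 10)))   # arbitrary change of basis
d_near, d_far = np.zeros(20), np.full(20, 10.0)
d_mixed = rng.uniform(0.0, 8.0, size=20)

print(weighted_rmsip(V, V, d_near), weighted_rmsip(V, V, d_far))
```

With this form the score stays high only when dynamically consistent residues are also
close in space, and it is unchanged when either set of modes is replaced by another
orthonormal basis of the same space.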

The sought dynamics-based alignment is accordingly obtained by maximizing the measure of
eq. (17) (after a suitable $n$-dependent regularization, see ref. [177]) over
the space of possible amino acid pairings in the two proteins, and finally by assessing its
statistical significance by comparing it against a null reference case.

Clearly, the combinatorial space of matching amino acids is very large and, because each
attempted alignment involves the re-calculation of the essential dynamical spaces, the
computational effort entailed by this comparison is significant and can take several
minutes on present-day computers for two proteins of ~100-200
amino acids.

By heuristically restricting the search of matching amino acids and by using approximate
but faster calculations of the alignment score, the original algorithm
of Zen et al. [177] was sped up sufficiently for interactive use via the Aladyn
web-server [130]. The results of this publicly-available server will be frequently referred
to in the remainder of this article.



2. Aligning proteins by matching pairwise distance 
fluctuations 

An alternative method to align proteins based on their internal dynamics properties was
recently proposed by Biggin and coworkers [111]. In this method one exclusively considers
the pairwise distance fluctuations of amino acids, with no explicit reference to the
spatial coordinates of the latter, nor to the detailed information
contained in the top essential dynamical spaces. This scheme is based on the idea that, if
a set of amino acids {α} in protein A has similar movements to a corresponding set of amino
acids {β} in protein B, then the matrices of pairwise distance fluctuations of the two
sets, $F_\alpha$ and $F_\beta$, should be similar too.

In the approach of Münz et al. [111] a generic entry of the $F$ matrix is defined as

$$F_\alpha(i,j) = \mathrm{std.dev}\,(d_{\alpha_i,\alpha_j}) \tag{18}$$

where the right-hand-side is the standard deviation of the distance of amino acids $i$ and
$j$ in set α calculated over a converged molecular dynamics trajectory.

FIG. 1: Example of dynamics-based alignment. The two cartoon structures in panels (a) and (b) have dissimilar shapes. Yet,
their internal movements, schematically indicated by the arrows, are consistent and can provide valuable clues for superposing
the two structures, as shown in panel (c).



Next, one calculates the relative difference of each cor- 
responding matrix entry. 



\F^{i.3)-F0{i,j)\ 
{F^{t,j)+Fp{i,j))/2 



(19) 



and an overall dynamical score S^^ {a, (3) is constructed 
by weighting the contribution of all d{i,jys. 

As in the previous approach, the best dynamics-based
alignment of the two proteins is found by maximising
$S(\alpha,\beta)$ (again after a suitable length-regularization
procedure) over all possible choices of {α} and {β}. In
the study of ref. [111] the exploration of the vast
combinatorial space of amino acid pairings was carried out
within a Monte Carlo optimization scheme.



3. Aligning proteins by matching the mean square
fluctuation profiles

The possibility to align proteins by detecting correspondences
in the amplitudes of amino acid motions in different proteins
was first explored by Keskin et al. [75]. In this study, which
is covered in section III A, the one-to-one correspondences of
amino acids in a set of structurally-related proteins were based
on a supervised matching of the amplitudes of amino acid
fluctuations computed from an isotropic elastic network model [8].

An automatic implementation of this alignment strategy was
recently introduced by Tobi [163]. In this study, the
one-dimensional character of the quantity to be matched (mean
square fluctuation) was exploited, as in sequence alignments,
within the dynamic-programming alignment of Needleman and
Wunsch [114].


III. COMPARATIVE STUDIES OF PROTEIN
INTERNAL DYNAMICS

Early systematic dynamics-based comparisons were all
targeted to groups of proteins known to be significantly
related from the sequence, structural or functional point
of view. In such contexts, in fact, the assessment and
interpretation of the comparisons are more straightforward.
Accordingly, we shall first discuss these comparative
investigations of proteins whose relatedness is known a
priori. We shall next report on studies which considered
proteins with limited structural relatedness as well
as investigations targeted at understanding more general
(and possibly evolutionary) dynamics-based aspects of
the structure/function relationship. When appropriate,
the results of these earlier studies will be revisited using
the dynamics-based alignment of ref. [177] as implemented
in the publicly available Aladyn web-server [130].


A. Common fluctuation patterns in proteins with a
Rossmann-like fold



We first discuss the case of proteins adopting a
Rossmann-like fold, which were addressed in the studies
of Keskin et al. [75] and Pang et al. [120].

In the study of ref. [75], which is arguably the first
dynamics-based comparative investigation, Keskin et al.
considered six proteins each consisting of two linked
globular domains with a Rossmann-like fold. The proteins
covered two homologous groups: the first one
(CATH [122] code 3.40.190.10) included the cofactor-binding
fragment of CysB, the lysine/arginine/ornithine-binding
protein (LAO), the enzyme porphobilinogen deaminase
(PBGD) and the N-terminal lobe of ovotransferrin
(OVOT), while the second one (CATH code 3.40.50.2300)
comprised the ribose-binding protein (RBP) and the
leucine/isoleucine/valine-binding protein (LIVBP).

The internal dynamics of these proteins was characterised
by using a simplified (isotropic) Gaussian network
model [8] to compute their mean-square fluctuation profiles
and the lowest energy modes. The authors observed
that the latter mostly entailed a hinge-bending motion 
of the two domains around the linker and the predicted 
motion amplitude varied significantly between the unli- 
ganded and liganded state of the molecules. In connec- 
tion to this latter result it is worth noting that for sev- 
eral other proteins it has been shown that the internal 
dynamics sensitively depends on substrates and cofactors.
A prototypical example is offered by dihydrofolate
reductase where dynamical properties, arguably linked to
catalysis, have been shown numerically to strongly depend
on the type of bound ligand[T3S]. 

The similarities of the modes amplitude profiles across 
the six proteins, further prompted Keskin et al. to at- 
tempt a manually-curated alignment of the proteins by 
matching the modes shape in a gapless portion of one of 
the two domains. The amino acid correspondences were
next extended to the remainder of the proteins by inspecting
both their FSSP structural alignments [68] and,
again, the modes shape. These supervised alignments returned
very good superpositions of the modes amplitude
profiles across the considered proteins and, because of the
limited use of structural correspondences, the RMSD after
an optimal superposition of the corresponding amino
acids was about 7 Å.

From the consistency of the modes' profiles the authors 
concluded that members of the same fold can share com- 
mon dynamical features on a global, collective scale and 
further envisaged that fully-automated dynamics-based
alignments of proteins might be feasible.

The implications of structural relatedness for the similarity
of protein internal dynamics were next explored by
Pang et al. [120] by using atomistic molecular dynamics
simulations on a set of four periplasmic binding proteins
in various forms: apo, holo and crystallized in different
conditions.

The monomeric units of these entries, which included
the LAO protein considered by Keskin et al. [75], comprised
about 230 amino acids and consisted, again, of two
Rossmann-like domains connected by a linker. Based on
DALI [67] alignments, Pang et al. identified a core of 100
amino acids (i.e. spanning about 40-45% of the proteins)
common to the four proteins.

The comparison of the internal dynamics was carried 
out on the common core amino acids and regarded various
quantities calculated from 10- or 20-ns-long molecular
dynamics simulations. In particular, the comparison in- 
cluded: the amino acids' mean square fluctuations, the 
overlap of the covariance matrices and the overlap (RM- 
SIP) of the two essential dynamical spaces. 
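
For concreteness, the overlap of two essential dynamical spaces can be sketched as follows, assuming the commonly used definition of the root mean square inner product, RMSIP = sqrt((1/n) sum_{i,j} (v_i . w_j)^2), where v_i and w_j are the top n orthonormal essential modes of the two proteins restricted to the matching amino acids. The function name and the toy four-dimensional modes are assumptions made for illustration; this is not the code used by Pang et al.

```python
import math


def rmsip(modes_a, modes_b):
    """Root mean square inner product between two sets of n orthonormal
    modes, each given as a list of equal-length lists of floats.
    Returns 1 for identical subspaces and 0 for orthogonal ones."""
    n = len(modes_a)
    total = 0.0
    for v in modes_a:
        for w in modes_b:
            dot = sum(x * y for x, y in zip(v, w))
            total += dot * dot
    return math.sqrt(total / n)


# Toy orthonormal vectors in a four-dimensional space.
e1 = [1.0, 0.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0, 0.0]
e3 = [0.0, 0.0, 1.0, 0.0]
e4 = [0.0, 0.0, 0.0, 1.0]

same_space = rmsip([e1, e2], [e1, e2])        # identical subspaces
half_shared = rmsip([e1, e2], [e1, e3])       # one shared direction
orthogonal_space = rmsip([e1, e2], [e3, e4])  # no shared direction
```

The three toy comparisons cover the limiting cases of full overlap, partial overlap (one shared direction out of two) and no overlap.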

By comparing the properties of the same protein
in its liganded and unliganded forms, Pang et al. observed
clear differences in the molecules' internal dynamics,
consistently with the findings of Keskin et al. reported
above.

Regarding the comparison of different proteins, the 
authors reported a significant overlap of all dynamical 
properties computed over the common core. In particular,
throughout the set of periplasmic binding proteins,
the first and second essential dynamical modes system- 
atically corresponded to, respectively, the hinge-bending 
and twisting motions of the linked domains. 

However, by examining how the overlap of the covariance
matrix and essential dynamical spaces increased
with simulation time, the authors observed that each protein
tended to occupy specific regions of the essential dynamical
space. It was concluded that these differences
reflected protein-specific features, arguably encoded in
their sequence. While it cannot be ruled out a priori
that the observed differences could be ascribable to
the several non-aligned amino acids, the observation of
Pang et al. is very interesting and relevant in the present
context, because it points to specific dynamics-based features
which can be beyond reach of sequence-independent
approaches, such as elastic network models.


tag   protein                           PDBid    CATH code
A     endothiapepsin (ASP)              1er8E    2.40.70.10
B     HIV-1 protease (ASP)              1nh0AB   2.40.70.10
C     3C-like proteinase (SER)          1uk4A    2.40.10.10
                                                 1.10.1840.10
D1    adenain (CYS)                     1avp     3.40.395.10
D2    sedolisin (SER)                   1ga6     3.40.50.200
D3    pyroglutamyl peptidase I (CYS)    1ioi     3.40.630.20
E     assemblin (SER)                   1jq7A    3.20.16.10
F1    dipeptidyl-peptidase I (CYS)      1k3bA    2.40.128.80
F2    cruzipain                         1me4     3.90.70.10
G1    atrolysin E (Zn)                  1kuf     3.40.390.10
G2    carboxypeptidase A1 (Zn)          8cpa     3.40.630.10

TABLE I: Representatives of the seven common protease
folds, A-G. The list includes proteases with different catalytic
chemistry (aspartic-, serine-, cysteine- and metallo-proteases).
For convenience of comparison, because the active
site is comprised within the monomeric units of 3C-like
proteinase, assemblin and dipeptidyl-peptidase I, we did not
consider the multimeric biological form of these entries.
Conversely, because the catalytic aspartic dyad of HIV-1 protease
straddles the dimeric interface, we retained its full dimer. The
corresponding structures are represented in Fig. 2.



B. Dynamics-based alignment of proteases

Proteases, enzymes that cleave peptide chains, account
for about 2% of the genome of various organisms [133,
140, 153]. In view of this representative weight and biological
importance, they have been systematically investigated
and compared.

The comprehensive survey carried out by Tyndall et
al. [165] identified 7 common structural folds for this family
of enzymes. Various representatives for the seven common
folds were identified by Carnevale et al. [19] and are
listed in Table I and shown in Fig. 2.

As reported in Table I, the various representatives
cover 4 different architectures and 9 different topologies
of the CATH classification scheme [122]. Notice that
the two aspartic proteases, endothiapepsin and HIV-1
PR, share the full CATH code, implying that they
have detectable sequence homology despite their
marginal sequence identity, different length and different
oligomeric state (monomeric for endothiapepsin and
dimeric for HIV-1 PR) [22, 159].

Besides this ASP-protease pair, other pairs of entries
listed in Table I have significant overall structural similarities.
In particular the six possible distinct pairings
between pyroglutamyl peptidase I, atrolysin E, sedolisin
and carboxypeptidase A1 are all significant according
to the DALI statistical criteria [67]. Interestingly, the
simultaneous multiple alignment of these four entries is
poor and involves several short fragments for a total of
about 30 amino acids (consistently for both Mistral and
Multiprot [102, 148]).

FIG. 2: Representative structures of the common protease folds listed in Table I. This illustration and subsequent ones were
prepared with the VMD graphical package [70].

The top structural alignments within this group involved
the entry pyroglutamyl peptidase I and are shown
in Fig. 3. As reported in ref. [19] (see Fig.
3 therein), the alignments involve several disconnected
matching fragments comprising the active site and the
surrounding region within 7-10 Å of it.

The good structural superposition of the active sites 
in panels (a) and (b) of the Figure provides evidence for 
the existence of functionally-related traits that are shared 
by proteases that are non-homologous and rely on different
catalytic chemistry (serine-, cysteine- and metallo-proteases).

The functional activity of various proteases
is known to be impacted by their large-scale internal
dynamics [13, 124-126], which can involve mechanical
couplings between the active site and distal regions at
the protein surface [101, 124-126]. This poses the question of
whether dynamics-based alignments can be used to identify
further relationships between proteases that are elusive
to the pure structural comparison. The possibility to
do so is illustrated in Fig. 4, which shows the
dynamics-based alignment of HIV-1 PR and endothiapepsin.

Following the spirit of ref. [19], we have used the Aladyn
algorithm to align all pairs of entries in Table I. In
addition to the previously mentioned significant structural
pairings, the Aladyn algorithm identifies 8 additional
significant alignments (p-value < 0.02, corresponding
to the incidence of less than one false positive in the
set of all pairwise alignments of the entries in Table I).
These pairs are shown in Fig. 5. Notice that cruzipain,
adenain, atrolysin E, and HIV-1 PR (corresponding
respectively to tags F2, D1, G1 and B in Table I) constitute
a notable dynamically-alignable "clique" because
all pairings of these proteins (with the sole exception of
cruzipain-HIV-1 PR, which involves only 30 amino acids)
are significant.

The structural and dynamical consistency of the 8
aligned pairs is shown in Fig. 5. It is striking to see
that the active sites of the compared proteins are very
well superposed or in contact, with the exception of two
alignments, assemblin-HIV-1 PR (E-B) and assemblin-atrolysin
E (E-G1), where the active sites are at a distance
of 10 Å. The overall RMSD of the matching amino
acids is ~3.0 Å.

It is also noticed that the corresponding modes tend
to outline a shearing deformation of the region surrounding
the active site. This result is in accord with the general
functional feature common to proteases, which consists
of the shearing of the bound peptide into a beta
extended conformation prior to cleavage [165]. More generally,
the finding is consistent with the observed property
that active sites in enzymes tend to be located at
the interface of quasi-rigid domains, as this can ensure
a fairly rigid geometry of the catalytic region located at
the interface combined with an appreciable modulation of
the surrounding region which ought to aid the substrate
recognition and processing [131, 146].

For the specific case of proteases, the dependence of
the enzymatic activity and catalytic rate on the global
conformational fluctuations of the proteins has been advocated
for HIV-1 PR [126] (but this does not occur for
furin, a serine protease [20]). The proposed mechanism
for HIV-1 PR has been corroborated by recent experimental
findings [30]. Further examples of the coupling
between the modulation of the geometry of the region
near the active site and the global protein motions are
provided by triose phosphate isomerase [80] and dihydrofolate
reductase [1, 141].

FIG. 3: Structural alignments of pyroglutamyl peptidase I with a) sedolisin, b) carboxypeptidase A1 and c) atrolysin E. In all
panels the pyroglutamyl peptidase I is shown in red, while the partner proteins are shown in blue. Aligned regions are shown
with thick ribbons and known active sites [129] are highlighted with Van-der-Waals surfaces. The trace of non-aligned regions
is shown as a thin grey curve.

FIG. 4: (a) Dynamics-based alignment of HIV-1 protease (red) and endothiapepsin (blue) obtained with the Aladyn web-server
[130]. A thick ribbon is used to highlight aligned regions and known active sites are highlighted with Van-der-Waals
surfaces. The trace of non-aligned regions is shown as a thin grey curve while the arrows represent the three best matching
essential modes. The ribbons and the modes are shown separately in panels (b) and (c), respectively.

We emphasize that none of the pairings identified with
the dynamics-based alignment and shown in Fig. 5 is
deemed significant in DALI alignments. The findings
therefore suggest that, for certain proteins and enzymes,
some functionally-oriented features can be more confidently
identified using dynamics-based alignments than
with sequence- or structure-based alignment approaches.



C. Dynamics-based alignment of PDZ domains 

We next discuss the dynamical similarities of members
of the PDZ domain family. PDZ domains are structural
modules commonly associated with ion channels and



receptors or otherwise involved in signal transduction 
pathways [U HTUl Ell . 

They are typically 80-100 amino acids long and adopt
an overall globular fold comprising two α helices and six
β strands, see Fig. 6. The interaction with a partner
protein usually occurs through the accommodation of its
C-terminal segment in the β2-α2 cleft. In fact, the observed
mobility of helix α2 relative to the PDZ-domain
core has been argued to be important for ligand binding 
and recognition [3TJ [771 UH). Although PDZ domains
undergo modest structural changes after ligand binding,
see panels a and b in Fig. 6, experimental and numerical
evidence suggests that there exist allosteric pathways
running internally to the molecule that signal the bind- 
ing event to regions that are opposite on the protein sur- 
face respect to the binding cleft [SI [771 [HI [88] . While 
key aspects of the signal propagation mechanism are still
controversial [15], various evolutionary aspects of the allosteric
mechanism and the binding mode have been actively
investigated using a variety of techniques including
bioinformatics |5S], NMR [5T], elastic network linear re- 
sponse theory and molecular dynamics simulations 
FIG. 5: Significant dynamics-based alignments of various pairs of proteases. The pairs are tagged as in Table I. For each pair
we report separately the structural superposition of the aligned regions (ribbons) and of the top three best-matching modes 
(arrows). Aligned elements are shown in blue for the first entry of the pair and in red for the second. The active sites are 
shown in cyan and pink for the first and second entry of the pair, respectively. 
Compound                                                      CATH domain
Postsynaptic density protein 95 (PSD-95)                      1be9A00
nitric-oxide synthase (nNOS)                                  1qauA00
Alpha-1 syntrophin                                            1qavA00
Inactivation-no-after-potential D protein (Inad)              1ihjA00
Segment polarity protein dishevelled homolog DVL-2 (DVL2)     2f0aA00
Glutamate receptor interacting protein 2 (GRIP2)              1x5rA00
CATH code: 2.30.42.10, 2.30.42.10, 1.14.13.39, 2.30.42.10
--------------------------------------------------------------------------
Tricorn protease                                              1k32A04
Type II secretion system protein C                            2i6vA00
hypothetical serine protease rv0983                           1y8tA03
Photosystem II D1 protease                                    1fc6A02
CATH code: 3.4.21, 2.30.42.10

TABLE II: List of PDZ domains considered in ref. [111]. The line separates PDZ domains of multicellular organisms (above
the line) from those of unicellular ones (below the line).




FIG. 6: (a) Apo and (b) holo forms of a PDZ domain. The PDBid of the shown entries is 1bfe for the apo form and 1be9 for
the holo one. The ligand bound to the α2-β2 cleft of the holo form is highlighted in orange.



In particular, Biggin and coworkers [111] have recently
introduced and systematically applied the dynamics-based
alignment outlined in section II G 2 to compare
the mainchain dynamics of 10 PDZ domains from both
unicellular and multicellular organisms, see Table II. The
dynamics-based comparison was based on the analysis
of pairwise distance fluctuations of amino acids calculated
from 20-ns-long atomistic molecular dynamics simulations.

Within this set of sequence- and structurally-related
PDZ domains Münz et al. observed the largest dynamical
consistency among the domains from multicellular organisms.
In fact, significant dynamics-based similarities
were found almost exclusively among entries from multicellular
organisms (particularly pairs nNOS-PSD95,
nNOS-Alpha-1 syntrophin, nNOS-DVL2, Inad-Alpha-1
syntrophin, Inad-DVL2, DVL2-Alpha-1 syntrophin).

One such pair, PSD95 and nNOS, was analysed in depth
to highlight the differences of sequence, structure
and dynamics-based alignments. Through this comparative
investigation, the authors noticed that dynamical
correspondences were particularly poor in the α2 region,
which is otherwise structurally well-alignable. Because
the mobility of this helix arguably impacts the binding
of ligands it was concluded that the dynamical differences
could reflect subtle differences in the functionality
of PSD95 and nNOS [111].

The findings of Münz et al. are illustrated and revisited
here through the dynamics-based alignment method
of Zen et al. as implemented in the Aladyn web-server.
The Aladyn alignment of PSD95 and nNOS is shown
in Fig. 7 and illustrates the good consistency of the essential
dynamical spaces of the aligned regions. Interestingly,
the contribution of the various corresponding
amino acids to the good RMSIP value, which is equal to
0.74, is rather uneven.

This is illustrated in Fig. 7, which portrays the
residue-wise contribution to the mean square inner product,
Q_i (see eq. 11), along with the mean-square residue
fluctuations. It is seen that the Q profile is peaked in
correspondence of the loops L1, L2, L3 and L4, which are also
associated to peaks of the crystallographic B-factor profiles.
Although the comparison of computed mean-square
fluctuations with B-factors is not perfectly transparent
(the latter are affected by crystal packing and disorder
pS]), the accord of the two sets of peaks is consistent with
the intuition that, given the overall accord of the essential
modes, the highest values of Q should be observed in
correspondence with regions of high mobility (where the
norm of the essential modes concentrates). By the same
token, one would have expected to observe a peak of the
Q profile in correspondence of the mobile helix α2 and
the nearby portions of the flanking strands β5 and β6.
By contrast, however, the relative contribution of these
regions to the RMSIP is small. This is therefore indicative
of a poor consistency of the generalised direction of
motion of this region in the two proteins of interest, thus
confirming the findings of Münz et al. from a different
dynamics-based perspective.

FIG. 7: Dynamics-based alignment [130] of the two PDZ domains discussed in ref. [111]. The alignment was obtained with
the Aladyn web-server [130] and consists of an uninterrupted stretch of 87 amino acids (ARG309-GLU395 for
1bfe and ASN14-GLU101 for 1qau) at an RMSD of 2.2 Å and with an RMSIP of 0.74. The structural superposition is shown in
panel (a) and the top three matching modes are shown in panel (b). Corresponding elements for entry 1bfe are shown in red
while those for entry 1qau are shown in blue. The crystallographic B-factors and the local essential dynamics space overlap,
Q_i (see eqn. 11), of 1bfe are shown respectively with a dashed and a solid line in panel (c).



D. Conservation of general dynamical patterns in 
protein families and superfamilies 

Besides the previous investigations that aimed at elu- 
cidating specific functionally-related aspects in different 
proteins by using dynamics-based alignment strategies, 
there have been a number of studies where more gen- 
eral dynamical properties were compared across various 
protein families and superfamilies. 

In recent years Echave and coworkers have carried out 
several such studies with the purpose of assessing the 
extent to which features such as mean-square fluctuation 
profiles and overall shape (amplitude modulation) of the 
essential modes have been evolutionarily conserved [91-93].



The first of such analyses was carried out for a set of 18
members of the globin family [92]. The considered globins
typically consisted of 130-150 amino acids and shared a
structural core of 68 amino acids [102].

The comparison of Maguid et al. [92] was focused on
the set of about 100 corresponding amino acids that were
identified by the multiple sequence alignment (CLUSTAL
[161]) of the 18 globins.

The dynamics of the globins was next characterised
by the mean-square fluctuation profiles and the molecules'
lowest energy modes, which were computed using the
isotropic Gaussian network model [8]. In this model, the
presence of the heme group was not taken into account.

The comparison of the dynamics across the different
globins was carried out by measuring the linear
correlation coefficient between the fluctuation amplitudes
of corresponding amino acids or between their displacements
in the top modes. For comparative purposes, the
latter were reranked so as to have maximally compatible
sets of first modes, second modes etc. across the globins.
The main differences of this comparative strategy from
the one described in section II D are that the dynamics of
the corresponding amino acids is obtained by neglecting
the effect of non-aligned amino acids (equivalent to omitting
the second term in eq. [s]) and that reranked
top modes are used in place of identifying the most consistent
directions in the linear space spanned by the top modes.
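
The comparative steps described above can be sketched as follows. The linear (Pearson) correlation coefficient is standard, whereas the re-ranking procedure is not detailed in this section: the greedy matching used in `rerank_modes` (each reference mode is paired with the not-yet-used mode having the largest absolute correlation) is therefore an assumption made here for illustration, as are the function names and the toy profiles. The absolute value is taken because the overall sign of a mode is arbitrary.

```python
import math


def pearson(x, y):
    """Linear correlation coefficient between two equal-length profiles
    (e.g. fluctuation amplitudes of corresponding amino acids)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rerank_modes(ref_modes, modes):
    """HYPOTHETICAL greedy re-ranking: for each reference mode, in order,
    pick the unused mode with the largest |correlation|.
    Returns the list of indices of `modes` in their new order."""
    remaining = list(range(len(modes)))
    order = []
    for ref in ref_modes:
        best = max(remaining, key=lambda k: abs(pearson(ref, modes[k])))
        order.append(best)
        remaining.remove(best)
    return order


# Toy one-dimensional mode profiles over four corresponding amino acids.
reference_modes = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
other_modes = [[2.0, -2.0, 2.0, -2.5], [2.0, 4.0, 6.0, 8.5]]

new_order = rerank_modes(reference_modes, other_modes)
r_profiles = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In the toy example the two modes of the second protein are listed in swapped order, and the re-ranking pairs each of them with the reference mode of matching shape.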

After carrying out these comparative steps, Maguid et
al. [92] concluded that both the mean-square fluctuations
and the shape (amplitude modulation) of the top
reranked modes were highly consistent across the various 
members of the globin family. 

Building on these findings, Maguid et al. [91, 93] extended
the analysis to a comprehensive set of ~1000
protein entries from several hundred families and superfamilies
of the HOMSTRAD database [TOTl [Bl Hill- The
studies followed the same comparative pathway outlined
above for the globins, with the significant modifications
that corresponding amino acids were identified
in pairwise MAMMOTH [119] structural alignments and
the anisotropic beta-Gaussian elastic network model was
used in place of the isotropic one. Furthermore, the degree
of collectivity of the modes was also assessed and
compared.

The studies of refs. [91, 93] reported that the dynamical
similarity (mean-square-fluctuation profiles, mode shape 
and mode collectivity) within members of the same fam- 
ily and superfamily was significantly larger compared to 
pairs of unrelated protein entries. In addition, the sim- 
ilarity within the same family was stronger than within 
the same superfamily. 

From this series of studies, Echave and coworkers con- 
cluded that general dynamical properties of proteins tend 
to be preserved in the course of evolution and are quan- 
titatively detectable. 



E. Conservation of specific functionally-oriented 
dynamics in enzymes 

In the recent study of ref. [137] Ramanathan et al. addressed,
by means of atomistic molecular dynamics simulations,
the extent to which enzymes with the same function
but different degree of homology rely on the same
functionally-oriented dynamics.

The study considered a few members for each of three 
different types of enzymes: the CypA peptidyl-prolyl iso- 
merase, the DHFR oxidoreductase and ribonuclease A 
(RNaseA). 

For each member, extensive molecular dynamics sim- 
ulations were carried out. The authors next compared 
the dynamics-based features that directly impacted the 
known rate-limiting step of the enzyme catalytic activity. 
This important technical step allowed Ramanathan et al. 
to address in a direct and precise way the functionally- 
oriented dynamical aspects of the proteins without re- 
lying on their dynamics-based alignment or considering 
general aspects of the internal dynamics that are inconsequential
for biological functionality [137].

By these means Ramanathan et al. ascertained that 
the reaction-coupled motions of the members of each of 
the three types of enzymes were highly similar. Because 
the members were picked from different species it was 
further concluded that the detailed functionally-oriented 
dynamical aspects have been evolutionarily conserved. 

The analysis established two further notable features.
First, the dynamical similarities found for the homologous
CypA entries were found to extend to the non-homologous
Pin1 peptidyl-prolyl isomerase. In consideration
of the structural differences between the modelled
structure of Pin1 and CypA, it was concluded that the
reaction-coupled motions of the enzymes were conserved
despite the structural differences. Secondly, it was observed
that the dynamical aspects influencing the functional
activity involved regions that are not necessarily
near the active site, thus pointing to an overall interplay
of local and global aspects in the functional "mechanics"
of the enzymes. The fact that these features
might hold for several other enzymes is reinforced by the
consistency with the findings reported earlier for members
of the protease family as well as by instances such
as R67 dihydrofolate reductase where enzyme flexibility
has been argued to impact the catalysed reaction [74].



F. Comparison of general dynamical patterns in 
members of the SCOP database 

Besides the above-mentioned studies, a comparative 
investigation of mean-square fluctuation profiles and 
mode shapes was recently undertaken by Tobi [163] 
for an extensive set of entries from the SCOP/ Astral 
database O |23]. A distinctive point of the analysis of 
ref. [163] is the fact that the set of amino acids over which
the dynamical properties are automatically compared is
not identified by sequence or structural alignments, but
by matching the fluctuation (or mode) amplitude profile
itself, as first envisaged by Keskin et al. [75].

A key ingredient of this comparative approach is the
use of the isotropic Gaussian network model [8]. Because
this phenomenological model does not possess
the full rotational-translational invariance of the three-dimensional
elastic networks, its essential dynamical
spaces have a one-dimensional character. By restricting
considerations to the one-dimensional profile of a single
mode (or of the mean-square fluctuation) Tobi used a
dynamic-programming strategy to identify corresponding
amino acids for various pairs of proteins.
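
The idea of aligning two one-dimensional profiles by dynamic programming can be illustrated with a minimal Needleman-Wunsch sketch. This is not the scoring scheme of ref. [163]: the similarity function (a bonus minus the absolute difference of the two profile values), the linear gap penalty, their numerical values and the toy profiles are all assumptions introduced here for illustration.

```python
def align_profiles(a, b, gap=-0.5, bonus=1.0):
    """Global (Needleman-Wunsch) alignment of two numeric profiles.

    The pair score is `bonus - |a_i - b_j|` and each gap costs `gap`
    (both HYPOTHETICAL choices). Returns (best score, list of matched
    index pairs (i, j))."""
    n, m = len(a), len(b)
    # score[i][j]: best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + bonus - abs(a[i - 1] - b[j - 1])
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        match = score[i - 1][j - 1] + bonus - abs(a[i - 1] - b[j - 1])
        if score[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy normalised fluctuation profiles: the second one reproduces the
# first one without its initial amino acid.
profile_a = [0.1, 0.9, 0.2, 0.8]
profile_b = [0.9, 0.2, 0.8]
best_score, matched_pairs = align_profiles(profile_a, profile_b)
```

For these toy profiles the optimal alignment leaves the first amino acid of the longer profile unmatched and pairs the remaining ones in order.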

Significant matches were reported for pairs of proteins
with different overall structural organization. Consistently
with the isotropic character of the elastic network
model, the lowest energy mode of these matching proteins
typically exhibited a single node located approximately in
the middle of the matching subchain, thus entailing a
hinge-bending motion. This motion was prototypically
illustrated in ref. [163] for two pairs of entries: OPRTase
(PDBid 1s7o chain A) with Mediator complex subunit 21
(PDBid 1ykh chain A fragment 111-205) and Baseplate
wedge protein 9 (PDBid 1s2e chain A) with transcarboxylase
(PDBid 1rqh chain A fragment 307-474).

Notably, the former of these two pairs also has a significant
dynamics-based alignment according to the scheme
of Zen et al. (which employs a three-dimensional elastic
network model as well as the integration of the dynamics
of non-corresponding amino acids). The corresponding
Aladyn alignment is shown in Fig. 8.
FIG. 8: Dynamics-based alignment of OPRTase (PDBid: 1s7o chain A) and Mediator complex subunit 21 (PDBid: 1ykh
chain A) discussed in ref. [163]. The structural superposition of the aligned regions (ribbons) and three best-matching modes
(arrows) are shown in panels (a) and (b), respectively. Aligned elements of OPRTase are shown in blue, while those of Mediator
complex subunit 21 are shown in red.



G. Comparison of the structural variability in a 
protein superfamily with the internal dynamics of 
its members 

An interesting problem regards the extent to which
evolutionary conformational drifts observed in protein
superfamilies occur along the essential dynamical spaces
of the family members.

This question was first posed by Leo-Macias et al. [82]
who considered 35 representative protein families. For
each family, the members were first structurally aligned
to identify the common core and then a principal component
analysis was carried out to obtain the main deformation
modes. The latter were finally compared with
the essential dynamical spaces obtained from elastic net- 
work models. The comparison of the two sets of spaces, 
which nowadays can be largely automated with the aid 
of bioinformatic tools such as ProDy^, indicated a good 
mutual consistency. 
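
One common way to quantify the consistency of two sets of modes is the root-mean-square inner product (RMSIP), the same measure quoted later in this review for Aladyn alignments. The following is a minimal sketch, not the procedure of ref. [82] and not ProDy code. It assumes that both sets contain the same number of orthonormal modes defined over the same aligned core; the function names and the toy vectors are purely illustrative.

```python
import math

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def rmsip(modes_a, modes_b):
    """Root-mean-square inner product of two sets of orthonormal modes.

    The value lies between 0 (orthogonal subspaces) and 1 (identical subspaces).
    """
    n_modes = len(modes_a)
    total = sum(dot(u, v) ** 2 for u in modes_a for v in modes_b)
    return math.sqrt(total / n_modes)

# Toy data in a 3-dimensional space (hypothetical, for illustration only).
evolutionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
dynamical_same_plane = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]  # same plane, modes swapped
dynamical_tilted = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]      # shares one direction only

print(rmsip(evolutionary, dynamical_same_plane))
print(rmsip(evolutionary, dynamical_tilted))
```

Because the measure depends only on the spanned subspaces, swapping or mixing the modes within one set leaves it unchanged, which is the desirable behaviour when individual modes are nearly degenerate.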

The investigation of Leo-Macias et al. was recently ex- 
tended by Velazquez-Muriel et al. [167] who considered a 
larger set of 55 families and used atomistic MD simula- 
tions. This study reported that the conformational space 
explored in MD simulations at constant temperature has 
a smaller breadth than that spanned by known members 
of the same superfamily. However, the complexity of the 
explored space is significantly larger for MD simulations 
than for the internal variability of protein superfamilies. 
In this study the complexity was defined and measured 
as the minimal number of essential modes required to 
account for the same fraction of the global mean-square 
fluctuation of the superfamily or MD trajectory. 
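
The notion of complexity described above can be sketched in a few lines. This is only an illustration: the threshold fraction and the toy eigenvalue spectra are assumptions made here, and they are not taken from ref. [167].

```python
def complexity(eigenvalues, fraction=0.9):
    """Smallest number of top-ranking modes whose eigenvalues account for
    at least `fraction` of the total mean-square fluctuation."""
    ranked = sorted(eigenvalues, reverse=True)
    target = fraction * sum(ranked)
    captured = 0.0
    for n_modes, value in enumerate(ranked, start=1):
        captured += value
        if captured >= target:
            return n_modes
    return len(ranked)

# Hypothetical spectra: a flatter one (MD-like) and a more concentrated one
# (superfamily-like). The numbers are invented for illustration.
md_spectrum = [4.0, 3.0, 2.0, 1.0]
family_spectrum = [6.0, 2.0, 1.0, 1.0]

print(complexity(md_spectrum, fraction=0.75))
print(complexity(family_spectrum, fraction=0.75))
```

With this definition a flat spectrum, where the fluctuation is spread over many modes, yields a larger complexity than a spectrum dominated by a few modes, irrespective of the overall breadth of the explored space.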

Based on these findings, Velazquez-Muriel et al. [167] 
concluded that the structural evolution of superfamilies 
has occurred in diverse and much richer ways than those 
kinetically accessible in thermal equilibrium to any of 
the superfamily members. Yet, such enhanced conformational 
variability was constrained in fewer generalised 
directions, compared to those that are a priori kinetically 
accessible. 

These conclusions, in turn, prompted the specula- 
tion that the restrictions to the viable superfamily "con- 
formational complexity" reflect the evolutionary pres- 
sure to preserve certain patterns of structural fluctua- 
tions/motion that cannot be arbitrarily modified without 
compromising dynamics-based aspects relevant to function. 
The effect was most evident for enzymes, where the 
largest restrictions of the conformational variability were 
observed [167]. 

The possibility that physics-based constraints may also 
promote the consistency of the evolutionary deformation 
modes and essential dynamical spaces was explored by 
Echave and coworkers in refs. [35, 36]. 



H. Dynamics-based alignment of proteins with 
different structure and function 



We now report on the studies of Zen et al. [177] who 
carried out comparisons of the internal dynamics of a 
comprehensive set of 76 enzymes covering the six main 
functional groups (oxidoreductases, transferases, hydrolases, 
lyases, isomerases, ligases). 

The analysis of Zen et al. was aimed at ascertaining 
whether similar functionally-oriented dynamical proper- 
ties (arising from either evolutionary conservation or con- 
vergence) could be found in enzymes with major sequence 
and structure differences. 

The study entailed the dynamics-based alignment (in 
the spirit of section II G) of all the possible pairings of 
such enzymes. About 30 of such pairings were singled 
out as being outstanding for statistical significance. Two 
thirds of such pairings involved enzymes with detectable 
sequence homology or structural similarity, as determined by 
global or partial structural superposition using the DALI 
alignment program. One such example is offered by the 
pair 1yb7-2had, which share the full CATH code despite 
their different function. The dynamics-based alignment 
of this pair is shown in Fig. 9, where one can observe 
the remarkable structural superposition of the molecules' 
active sites. 

Interestingly, the remaining third of the significant 
pairings involved entries whose structural relatedness was 
not significant by standard alignment criteria and occa- 
sionally involved enzymes with different function, i.e. dif- 
ferent primary Enzyme Commission (EC) number. 

Two such pairs are 1dy4-2ayh and 1ako-1d7o, which are 
shown in panels b and c of Fig. 9, respectively. It is seen 
that while the overall structural correspondence is limited 
(and in fact aligned regions can have different secondary 
structure content), the alignment reflects a very good 
consistency of the matching modes as well as the 
superposition of the known active regions. 

As for the previously discussed case of proteases, the 
match of the latter and the fact that the matching modes 
entail the modulation of the region surrounding the active 
site support the notion that common functionally-oriented 
dynamics-based properties can be detected in 
proteins that possibly differ by structure and even detailed 
catalytic chemistry [55] [137]. 



I. Comparing large-scale movements of 
multidomain proteins 

As anticipated at the end of section II A, a particularly 
challenging case for characterizing protein internal 
dynamics, as well as comparing it, is represented by proteins 
comprising mobile domains. 

For such molecules, in fact, the relative displacements 
of the mobile domains can be so large that the motion 
is only poorly described by linearly superimposing a few 
essential modes onto a reference structure, see Fig. 3 in 
ref. [151]. A familiar example is offered by the opening 
of a door: the larger the opening angle, the poorer 
the directional consistency between the initial displacement 
of the door's edge and the difference vector of the initial 
and final edge positions. As a consequence, the essential 
dynamical spaces calculated for a short trajectory, or 
by applying elastic network models to a specific protein 
conformer, can capture and describe large-amplitude 
motions in such complexes only to a limited extent. 
Furthermore, the very calculation of essential dynamical 
spaces from extensive MD simulations can be problematic 
because it relies on rigid structural alignments, which 
cannot superimpose the visited conformers well over all 
their amino acids. 
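
The door analogy can be made quantitative with a short calculation, given here as an illustrative sketch rather than as part of the original analysis. For a door of unit width hinged at the origin, the cosine of the angle between the initial displacement of the edge and the net displacement after opening by an angle theta equals cos(theta/2). The function name and the chosen angles are assumptions of this example.

```python
import math

def directional_consistency(opening_angle_deg):
    """Cosine of the angle between the initial displacement of the door's edge
    and the difference vector of its final and initial positions.

    The opening angle must be non-zero, otherwise the difference vector vanishes.
    """
    theta = math.radians(opening_angle_deg)
    # Edge of a unit-width door hinged at the origin, initially along the x axis.
    start = (1.0, 0.0)
    end = (math.cos(theta), math.sin(theta))
    tangent = (0.0, 1.0)  # direction of motion at the very start of the opening
    diff = (end[0] - start[0], end[1] - start[1])
    norm = math.hypot(diff[0], diff[1])
    return (tangent[0] * diff[0] + tangent[1] * diff[1]) / norm

for angle in (10, 45, 90, 150):
    print(angle, round(directional_consistency(angle), 3))
```

The consistency stays close to 1 for small openings and decays steadily for larger ones, which is why a linear mode computed around a single conformer progressively loses its descriptive power for large-amplitude domain rotations.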

At least for some proteins with mobile subdomains, us- 
ing internal angular coordinates instead of Cartesian dis- 
placements can provide a viable alternative for describing 
the large-amplitude protein motion [89l [97l fTTS] . 

The fact that suitably-defined angular coordinates can 
be used for comparing the dynamics of proteins articulated 
in several domains was recently illustrated by Morra 
et al. in ref. [108]. This study considered three homodimeric 
multidomain HSP90 chaperones, namely mammalian 
Grp94, yeast Hsp90 and E. coli HtpG. The three 
chaperones, which are represented in Fig. 10, have a mutual 
sequence identity of ~45% and most of their amino 
acids can be put into one-to-one correspondence by using 
flexible structural alignment [174]. 

FIG. 9: Examples of significant dynamics-based alignments 
of proteins with different degrees of structural and functional 
similarity (captured by the CATH code and primary EC 
number, respectively). The examples are taken from ref. [177] 
and the alignments were produced with the Aladyn web 
server. The aligned proteins in panel (a) have the same fold 
(they share the full CATH code) but have different function. 
The pair in panel (b) have the same function but different 
CATH architecture. The pair in panel (c) differ by CATH 
architecture and function. The pair in panel (a) involves a 
haloalkane dehalogenase (PDBid: 2had, CATH: 3.40.50.1820, 
EC: 3) and an (S)-acetone-cyanohydrin lyase (PDBid: 1yb7, 
CATH: 3.40.50.1820, EC: 4). The pair in panel (b) involves a 
cellobiohydrolase I (PDBid: 1dy4, CATH: 2.70.100.10, EC: 3) 
and a glucanase (PDBid: 2ayh, CATH: 2.60.120.200, EC: 3). 
The pair in panel (c) involves an exonuclease (PDBid: 1ako, 
CATH: 3.60.10.10, EC: 3) and an enoyl-reductase (PDBid: 
1d7o, CATH: 3.40.50.720, EC: 1). For each pair we report 
separately the structural superposition of the aligned regions 
(ribbons) and of the top three best-matching modes (arrows). 
Aligned elements are shown in blue for the first entry of the 
pair and in red for the second. The active sites are shown 
in cyan and pink for the first and second entry of the pair, 
respectively. 

The internal dynamics of the chaperones was char- 
acterised by extensive molecular dynamics simulations 
started from different initial conformers which differed 
by the presence and type of bound ligand. Next, to ex- 
tract the large-scale dynamical features that are shared 
by the chaperones, considerations were restricted to the 
extensive set of corresponding amino acids. 

The motion of this set was found to be well approximated 
by the relative rigid-like movements of three quasi-rigid 
domains (similar, but not equal, to the structural 
ones). In fact, for all three chaperones it 
was possible to identify two consensus hinges and axes of 
motion controlling the rotation of the side domains relative 
to the core of each protomer. Notably, one of the 
hinges (the one at the boundary of the N-terminal and 
Middle domains) coincides with a site that 
had been previously shown to be important for chaperone 
functionality. Indeed, it was validated as a potential 
target for HSP90 inhibition [TMl [TBS]. Based on the 
detailed analysis of the same simulations carried out in 
ref. [108], it was further concluded that an analogous role 
could be played by the site accommodating the second 
hinge. 

The study of ref. [108] therefore suggests that comparative 
dynamical analysis based on quasi-rigid protein domain 
movements could represent a promising avenue for 
identifying functional relationships in multidomain pro- 
teins and possibly protein complexes too. 



J. A dynamics-based metric for protein space 

We conclude the overview by reporting on the recent 
work of Hensen et al. [58] who considered a set of ~100 
proteins covering the main known folds and compared 
their structural features and especially a comprehensive 
series of dynamical observables calculated from 100-ns 
long atomistic MD simulations. In particular, to each 
protein entry, Hensen et al. associated a dynamical 
"fingerprint" consisting of a multidimensional array whose 
components were dynamics-based scalars. These scalar 
quantities included the spread of the essential dynamics 
eigenvalues, the roughness of the free energy landscape, 
the root-mean-square deviation from the crystallographic 
structure, the root-mean-square fluctuations 
from the average structure, etc. 

At variance with the studies mentioned earlier, which 
aimed at detecting detailed dynamical correspondences 
among proteins, the investigation of ref. [58] was mostly 
targeted to establishing the overall features of the space 
spanned by the dynamical fingerprints. In particular, 
Hensen et al. meant to introduce a dynamics-based metric 
to explore the occupation of the fingerprint space 
(termed the "dynasome space") and understand e.g. 
whether structurally or functionally similar proteins can 
be clustered. 
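
The idea of a metric on fingerprint space can be illustrated with a minimal sketch. The standardisation of each observable followed by a Euclidean distance is an assumption made here for illustration and is not claimed to be the scheme adopted in ref. [58]. The protein labels and the numerical values are likewise hypothetical.

```python
import math

def standardize(fingerprints):
    """Rescale each observable to zero mean and unit variance across proteins."""
    n = len(fingerprints)
    columns = list(zip(*fingerprints))
    means = [sum(col) / n for col in columns]
    # A constant observable has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0
            for col, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in fingerprints]

def distance(a, b):
    """Euclidean distance between two standardized fingerprints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fingerprints: three dynamics-based scalars per protein.
raw = {
    "protA": [1.0, 10.0, 0.20],
    "protB": [1.1, 11.0, 0.25],
    "protC": [3.0, 30.0, 0.90],
}
names = list(raw)
scaled = standardize([raw[name] for name in names])

# Nearest neighbour of the first entry in the standardized fingerprint space.
query = 0
nearest = min((i for i in range(len(names)) if i != query),
              key=lambda i: distance(scaled[query], scaled[i]))
print(names[query], "->", names[nearest])
```

Standardising the observables first prevents quantities with large numerical ranges from dominating the distance, so that proximity reflects the overall dynamical profile rather than a single scalar.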

From this survey, the authors concluded that in the 
considered dynamical space, proteins are not partitioned 
in distinct clusters but are distributed rather continu- 
ously. This interesting aspect therefore parallels the find- 
ings of recent studies which support the view that struc- 
tural properties cover a continuum rather than a discrete 
succession of conformers [HH [1491 IT721IT80] . 

The analysis has further revealed the strong connection 
between dynamical and structural similarities, consis- 
tently with the studies, mentioned earlier in this review, 
where the structural relatedness has been frequently as- 
sociated to strong dynamical implications. 

It is interesting to observe that, as in the study of Zen 
et al. [177] described above, the analysis 
of Hensen et al. [58] has highlighted the possible 
existence of appreciable dynamical similarities in proteins 
with limited structural relatedness. The example offered 
by the authors pertained to the pairing of two hydrolases, 
serralysin and rhizopuspepsin (PDB codes 1sat and 
2apr). Their structural alignment is non-significant 
according to DALI statistical criteria, while in the dynamic 
metric space considered by the authors they have a strong 
dynamical proximity. Consistently with this finding, the 
Aladyn alignment of this pair (which involves 79 amino 
acids) is statistically significant too, as the observed 
RMSIP is 0.66 and the associated p-value is 0.025. 

Finally, by examining the dynamic fingerprint of 
functionally-related proteins, Hensen et al. [58] concluded 
that it ought to be possible to reliably establish and assign 
protein function based on their neighbours in the 
metric dynamic space. Indeed, the possibility to carry 
out functional assignments on the basis of dynamics- 
based data represents a very interesting avenue with sev- 
eral practical ramifications. 

As a related issue we report that pairwise dynamics- 
based alignments have been previously carried out with 
the purpose of predicting the active site of proteins for 
which standard homology-based approaches are not applicable. 
In particular, this approach was undertaken to 
predict the nucleic-acid binding sites of proteins adopting 
non-canonical OB-folds, as discussed in ref. [178]. 



IV. CONCLUSIONS 

Over the past decades, several bioinformatics tools and 
computational methods have been introduced and sys- 
tematically applied to clarify aspects of the relationship 
between structure and dynamics for proteins and enzymes. 

Many such studies contributed to clarifying how the 
interplay of structure and internal dynamics of various 
proteins impacts their biological functionality. The latter, 
in fact, is often - though not always - associated 
with the innate capability of these biomolecules to sustain 
concerted, large-scale conformational changes so as to 
bind ligands, change oligomeric state etc. 

FIG. 10: Crystallographic structures of three HSP90 conformers used in the comparative dynamics study of ref. [108]. The 
structures correspond to: (A) canine ATP-bound Grp94 structure, PDBid: 2o1u; (B) yeast ATP-bound Hsp90 structure, 
PDBid: 2cg9; (C) HtpG structure, PDBid: 2iop. Different colors are used to highlight the various structural subdomains: 
blue, N-terminal domains; red, M-large domains; orange, M-small domains; yellow, C-terminal domains. Reproduced from 
Fig. 1 of ref. [108]. 

In recent years, besides the well-established approach 
of dissecting such properties for specific, individual proteins 
and enzymes, there has been a growing interest in 
comparative studies of proteins' internal dynamics. 

In such studies, covered by this review, the key 
dynamics-based properties of proteins are singled out by 
identifying those features (such as essential dynamical 
spaces, mean square fluctuation profiles, relaxation times 
etc.) that are shared by proteins with different degrees 
of sequence, structure and functional similarities. 

Such comparative investigations have been carried out 
with two main purposes: characterizing functionally- 
oriented mechanisms for specific groups of proteins and 
understanding the more general organization of the "protein 
universe" by complementing the sequence- and structural 
perspectives with a dynamics-based one. 

For the first objective, detailed comparative tools have 
been developed, including the so-called dynamics-based 
alignments which use dynamics-based properties to es- 
tablish one-to-one correspondences of amino-acids in dif- 
ferent proteins. These strategies have been used to iden- 
tify common hinge-bending motions in multi-domain pro- 
teins, to complement sequence- and structural-alignment 
in singling out functionally-relevant regions in proteins 
with different degrees of homologies, and to highlight 
common large-scale movements in proteins that differ sig- 
nificantly by fold and/or function. 

The latter aspect is tightly connected to the second 
objective, namely the development and use of 
dynamics-based criteria to trace elusive evolutionary 
relationships and group/classify proteins by their internal 
dynamics |171 [5U1 151| . This perspective has been pursued 
so far to highlight the degree of conservation of the amplitude 
of amino acid fluctuations in protein families and 
superfamilies, to clarify the extent to which the structural 
variations accumulated within protein superfamilies have 
occurred along the "innate" directions of structural 
fluctuations of their members, and even to introduce a metric 
to quantify how evenly proteins are distributed in a generalized 
dynamics-space. The latter perspective can have 
important implications for functional assignment. 

In conclusion, the valuable findings provided by the 
recent introduction of methods for comparing detailed 
or general dynamical properties of proteins suggest that 
they could be profitably used in conjunction with classic 
comparative methods to characterize proteins at the various 
steps of the sequence → structure → function ladder. 

Arguably, the progress towards this goal would be 
greatly aided by the development of unsupervised methods 
to single out those dynamical features that are more 
likely attributable to the biological functionality of a given 
protein and by the more systematic investigation of evo- 
lutionary relationships from a detailed dynamics-based 
perspective. 